Word Clustering Using Word Embedding Generated by Neural Net-based Skip Gram
Authors
Abstract
This paper proposes word clustering using word embeddings. We used a neural net-based continuous skip-gram method to generate word embeddings in continuous space. The proposed word clustering method represents each word as a vector in that space using a neural network; the K-means clustering method then partitions the word embeddings into a predetermined number K of word clusters.
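A minimal sketch of the clustering step described in the abstract, assuming the word embeddings have already been produced by a skip-gram model. The 2-D vectors below are hand-made toy values, not real skip-gram output, and the plain K-means implementation stands in for whatever library the authors used:

```python
# Toy "word embeddings" (illustrative values; a real pipeline would take
# these from a trained skip-gram model, typically with 100+ dimensions).
embeddings = {
    "king":  (0.90, 0.80),
    "queen": (0.85, 0.90),
    "man":   (0.80, 0.10),
    "woman": (0.75, 0.20),
    "apple": (0.10, 0.90),
    "pear":  (0.15, 0.85),
}

def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Plain K-means: alternate nearest-centroid assignment and mean update."""
    # Deterministic initialisation for the sketch: first k vectors as
    # centroids (random initialisation is the usual choice in practice).
    centroids = list(vectors[:k])
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda i: sqdist(v, centroids[i]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(d) / len(members)
                                     for d in zip(*members))
    return labels

words = list(embeddings)
labels = kmeans(list(embeddings.values()), k=3)
clusters = dict(zip(words, labels))
```

On this toy data the semantically related pairs (royalty, person, fruit) end up in the same cluster, which is the behaviour the paper relies on: words close in embedding space land in the same K-means partition.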
Similar Papers
Learning Word Representation Considering Proximity and Ambiguity
Distributed representations of words (aka word embedding) have proven helpful in solving natural language processing (NLP) tasks. Training distributed representations of words with neural networks has lately been a major focus of researchers in the field. Recent work on word embedding, the Continuous Bag-of-Words (CBOW) model and the Continuous Skip-gram (Skip-gram) model, have produced particu...
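The Skip-gram model mentioned above is trained on (centre word, context word) pairs drawn from a sliding window over the corpus. A minimal sketch of that pair extraction (the toy sentence and window size are illustrative, and the downstream neural training is omitted):

```python
def skipgram_pairs(tokens, window=2):
    """Collect (center, context) training pairs for a skip-gram model:
    every word within `window` positions of the center is a context word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=1)
```

CBOW inverts the direction of prediction: it uses the same windowed pairs but predicts the centre word from its context rather than the context from the centre.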
Revisiting Skip-Gram Negative Sampling Model with Regularization
We revisit skip-gram negative sampling (SGNS), a popular neural-network-based approach to learning distributed word representations. We first point out the ambiguity issue undermining the SGNS model, in the sense that the word vectors can be entirely distorted without changing the objective value. To resolve this issue, we rectify the SGNS model with quadratic regularization. A theoretical justifi...
A Simple Word Embedding Model for Lexical Substitution
The lexical substitution task requires identifying meaning-preserving substitutes for a target word instance in a given sentential context. Since its introduction in SemEval-2007, various models addressed this challenge, mostly in an unsupervised setting. In this work we propose a simple model for lexical substitution, which is based on the popular skip-gram word embedding model. The novelty of...
A Syllable-based Technique for Word Embeddings of Korean Words
Word embedding has become a fundamental component of many NLP tasks such as named entity recognition and machine translation. However, popular models that learn such embeddings are unaware of the morphology of words, so they are not directly applicable to highly agglutinative languages such as Korean. We propose a syllable-based learning model for Korean using a convolutional neural network, in wh...
WEMOTE - Word Embedding based Minority Oversampling Technique for Imbalanced Emotion and Sentiment Classification
Imbalanced training data is a persistent problem for supervised emotion and sentiment classification. Several existing studies have shown that data sparseness and small disjuncts are the two major factors affecting classification. Targeting these two problems, this paper presents a word-embedding-based oversampling method. Firstly, a large-scale text corpus is used to train a continuous ...
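The abstract is truncated, so the following is only one plausible reading of an embedding-based oversampling step, in the spirit of SMOTE: synthesize new minority-class points by interpolating between existing minority vectors in the embedding space. The function name and toy values are illustrative assumptions, not the paper's actual method:

```python
import random

def oversample_minority(minority_vectors, n_new, seed=0):
    """SMOTE-style oversampling in embedding space (an assumed mechanism,
    since the abstract is truncated): each synthetic point lies on the line
    segment between two randomly chosen minority-class vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority_vectors, 2)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Toy minority-class "text embeddings" (illustrative values).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = oversample_minority(minority, n_new=5)
```

Because each synthetic point is a convex combination of two real minority points, the oversampled data stays inside the region the minority class already occupies in embedding space.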